
rapl: raise max sane power limit from 100W to 1000W #539

Open
surban wants to merge 1 commit into intel:master from surban:fix-rapl-max-sane-power



surban commented Mar 18, 2026

Problem

The rapl_max_sane_phy_max constant (100W / 100000000 µW) is used to sanity-check constraint_0_max_power_uw from the powercap RAPL sysfs. When the reported max power exceeds this threshold, thermald falls back to dynamic mode with max_state=0, effectively preventing RAPL-based cooling on non-DPTF platforms.

Modern desktop CPUs regularly exceed 100W max power. For example, Intel 13th/14th gen K-series processors (i9-13900K, i9-14900K) report constraint_0_max_power_uw = 125000000 (125W PBP). This causes the sanity check to trip, setting max_state=0 for the rapl_controller. As a result, thermald cannot reduce PL1 to manage thermals on these systems.
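The failure mode described above can be illustrated with a minimal sketch. This is not thermald's code: the function name `is_sane_max_power` and the constant names are hypothetical, and the only assumption taken from the description is that a reported value above the threshold is treated as invalid.

```python
# Hypothetical sketch of the sanity check described above.
# Names are illustrative and do not come from the thermald source.
OLD_MAX_SANE_UW = 100_000_000      # 100 W, the current threshold
NEW_MAX_SANE_UW = 1_000_000_000    # 1000 W, the threshold proposed in this PR


def is_sane_max_power(reported_max_uw: int, max_sane_uw: int) -> bool:
    """Return True if the reported max power (in µW) passes the sanity check."""
    return reported_max_uw <= max_sane_uw


# 125 W is the constraint_0_max_power_uw value quoted for the i9-13900K.
i9_13900k_uw = 125_000_000
print(is_sane_max_power(i9_13900k_uw, OLD_MAX_SANE_UW))
print(is_sane_max_power(i9_13900k_uw, NEW_MAX_SANE_UW))
```

Under the old threshold the 125 W value is rejected, which corresponds to the max_state=0 fallback; under the proposed threshold it is accepted.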

Observed behavior

On a Dell Precision 3660 with i9-13900K running Fedora (kernel 6.19):

  • thermald logs show MX:0 for the rapl_controller cooling device
  • RAPL-based PL1 throttling never activates despite temperatures exceeding the configured trip point
  • CPU reaches 95°C+ with fans at idle RPM because the EC's built-in fan curve has significant thermal lag (designed for Windows DPTF at 70°C)
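To check whether another system would trip the same check, the reported value can be read directly from the powercap sysfs. The snippet below is a small sketch; the default path assumes the package-0 zone is exposed as `intel-rapl:0`, which may differ between systems (for example when the MMIO interface is in use), and the helper name `read_max_power_watts` is hypothetical.

```python
from pathlib import Path

# Assumed location of the package-0 RAPL zone; adjust for the system at hand.
DEFAULT_PATH = Path("/sys/class/powercap/intel-rapl:0/constraint_0_max_power_uw")


def read_max_power_watts(path: Path = DEFAULT_PATH) -> float:
    """Read a microwatt value from a powercap attribute file and return watts."""
    return int(path.read_text().strip()) / 1_000_000
```

Calling `read_max_power_watts()` on an affected machine should return a value above 100, such as the 125 W reported for the i9-13900K.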

Fix

Raise the limit to 1000W (1000000000 µW) to accommodate current and future desktop and server processors while still catching genuinely invalid values. Current high-end server parts (e.g. Xeon W9-3595X) already reach 350W TDP.
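The arithmetic behind the new value can be checked with a short sketch. The helper `watts_to_uw` is hypothetical. The 32-bit comparison is only a precaution: this description does not state what integer type holds the constant.

```python
UW_PER_WATT = 1_000_000
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def watts_to_uw(watts: int) -> int:
    """Convert watts to microwatts, the unit used by the powercap sysfs."""
    return watts * UW_PER_WATT


old_limit_uw = watts_to_uw(100)    # 100000000
new_limit_uw = watts_to_uw(1000)   # 1000000000

# The 350 W figure cited above sits well inside the new limit,
# and the new limit would still fit in a signed 32-bit integer.
assert watts_to_uw(350) < new_limit_uw
assert new_limit_uw <= INT32_MAX
```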

Testing

Tested on Dell Precision 3660 (i9-13900K, Fedora 43, kernel 6.19.8):

  • With 100W limit: rapl_controller_mmio shows MX:0, no PL1 reduction under load
  • With raised limit: rapl_controller_mmio shows MX:100000000, thermald successfully reduces PL1 from 200W to ~190W, holding CPU temperature around 82-87°C under full 24-core stress

The rapl_max_sane_phy_max constant (100W) is used to sanity-check the
value of constraint_0_max_power_uw read from the powercap RAPL sysfs.
When the reported max power exceeds this threshold, thermald falls back
to dynamic mode with max_state=0, effectively preventing RAPL-based
cooling from working on non-DPTF platforms.

Modern desktop CPUs regularly exceed 100W max power. For example, Intel
13th/14th gen K-series processors (i9-13900K, i9-14900K) report
constraint_0_max_power_uw = 125000000 (125W), which is their base power
(PBP). This causes the sanity check to trip, setting max_state=0 for the
rapl_controller. As a result, thermald cannot reduce PL1 to manage
thermals on these systems.

Raise the limit to 1000W (1000000000 uW) to accommodate current and
future desktop and server processors, while still catching genuinely
invalid values. Current high-end server parts (e.g. Xeon W9-3595X)
already reach 350W TDP.

Copilot AI review requested due to automatic review settings, March 18, 2026 21:44

Copilot AI left a comment


Pull request overview

This PR raises thermald’s RAPL sysfs sanity threshold so that modern high-power CPUs are no longer misclassified as reporting an invalid constraint_0_max_power_uw. That misclassification previously forced a fallback mode that can prevent effective RAPL-based cooling on non-DPTF platforms.

Changes:

  • Increase rapl_max_sane_phy_max from 100W (100,000,000 µW) to 1000W (1,000,000,000 µW).
  • Update the constant’s inline comment to reflect the new intended upper bound.

